Abstract. This paper considers the nonparametric regression model with an additive 
error that is dependent on the explanatory variables. As is common in empirical studies 
in epidemiology and economics, it also supposes that valid instrumental variables are 
observed. A classical example in microeconomics considers the consumer demand 
function as a function of the price of goods and the income, both variables often 
considered as endogenous. In this framework, the economic theory also imposes shape 
restrictions on the demand function, like integrability conditions. Motivated by this 
illustration in microeconomics, we study an estimator of a nonparametric constrained 
regression function using instrumental variables by means of Tikhonov regularization. 
We derive rates of convergence for the regularized model both in a deterministic and 
stochastic setting under the assumption that the true regression function satisfies a 
projected source condition including, because of the non-convexity of the imposed 
constraints, an additional smallness condition. 
1. Motivation 

We consider the model 

\[ Y_i = g(X_i) + e_i, \qquad i = 1, \dots, n, \]

where $(Y_i, X_i)_{i=1,\dots,n}$ is a sample of observations of size n representing respectively the 
measured data and the variables affecting the measurements. The function g describes the 
dependence of the data on the variables, and $e_i$ is a combination of noise (measurement 
errors) and modeling errors, often resulting from the omission of relevant variables. 
The goal is the estimation of the function g. If the modeling errors e and the variables 
X are not dependent, that is, if the conditional expectation $E(e \mid X)$ of e given X is zero, 
then it is possible to identify g by 

\[ g(x) := E(Y \mid X = x). \]

If, however, the conditional expectation of e given X does not vanish, then this will lead 
to a biased estimate, as 

\[ E(Y \mid X = x) = g(x) + E(e \mid X = x). \]

The variables X are then called endogenous variables. This issue of endogeneity typically 
arises in the presence of modeling errors, in particular, if variables have been omitted 
from the model that simultaneously influence both X and Y. This has been illustrated 
in several applications, for example in epidemiology (see [TTJ [HJ 128] ) and in economics 
(see [31] and also the survey [2]). In the classical microeconomic setting of consumer 
demand, the endogeneity issue has also been raised. In this framework, the variable Y 
represents the observed demand of a consumer for k goods, and the explanatory variables 
X include the vector of prices P of the goods and the total budget Z > 0 of the consumer; 
the function $g\colon \mathbb{R}^k_{\geq 0} \times \mathbb{R}_{>0} \to \mathbb{R}^k_{\geq 0}$ denotes the consumer demand. The problem of price 
endogeneity has been highlighted in several research articles (see for example [81I2T1I22]). 
In an industrial organization framework, the paper [I] analyzes demand and supply 
in differentiated product markets (like the US automobile industry) and highlights the 
problem caused by the correlation between prices and product characteristics, some of 
which are observed by the consumer but not by the econometrician. Similarly, total 
expenditure endogeneity has been studied in particular for the analysis of Engel curves, see 
for example [7]. 

One remedy is the use of instruments, that is, different variables W, which 
influence both P and Z but are uncorrelated with e (see [2] for an overview). The 
analysis of nonparametric instrumental regression has been conducted in several works 
such as [IHl [TFl [THl [27]. Therefore we consider the model 

\[ Y = g(X) + e \tag{1} \]

and we assume that the random variable X = (P, Z) is described by instruments W in 
such a way that $E(e \mid W) = 0$. Therefore, the equation (1) can be transformed into 

\[ E(g(X) \mid W = w) = E(Y \mid W = w). \tag{2} \]

We assume in the following that the relation between Y, X and W is described by 
a joint density $f_{YXW}\colon \Omega_Y \times \Omega_X \times \Omega_W \to \mathbb{R}_{\geq 0}$, where, for simplicity, the finite measure 
spaces $\Omega_Y$, $\Omega_X$ and $\Omega_W$ are assumed to be normalized. We consider $L^2$ spaces with 
respect to this joint probability density and denote for example by $L^2(\Omega_X)$ the space of functions 
depending on P and Z only. In addition, we denote by $f_{YW}$, $f_{XW}$, $f_W$ the corresponding 
marginal densities defined by 

\begin{align*}
f_{YW}(y,w) &= \int_{\Omega_X} f_{YXW}(y,x,w)\,dx, \\
f_{XW}(x,w) &= \int_{\Omega_Y} f_{YXW}(y,x,w)\,dy, \\
f_W(w) &= \int_{\Omega_Y}\int_{\Omega_X} f_{YXW}(y,x,w)\,dx\,dy.
\end{align*}
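The endogeneity bias described in this section, and its removal by an instrument, can be illustrated by a small simulation. The sketch below is not the estimator studied in this paper: it assumes a hypothetical linear design with g(x) = beta * x, an omitted variable that drives both X and e, and a scalar instrument W, and it only uses the unconditional moment E(eW) = 0 implied by $E(e \mid W) = 0$. All function and variable names are illustrative.

```python
import numpy as np

def simulate_endogenous_sample(n, beta=2.0, seed=0):
    """Draw (Y, X, W) where an omitted variable influences both X and e.

    Hypothetical linear design: g(x) = beta * x, W is the instrument.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)
    omitted = rng.normal(size=n)               # not observed by the analyst
    x = w + omitted + 0.5 * rng.normal(size=n)
    e = omitted + 0.5 * rng.normal(size=n)     # E(e | X) != 0, but E(e | W) = 0
    y = beta * x + e
    return y, x, w

def ols_slope(y, x):
    """Least-squares slope of Y on X (all variables have mean zero here)."""
    return float(np.dot(x, y) / np.dot(x, x))

def iv_slope(y, x, w):
    """Slope solving the sample version of E((Y - b X) W) = 0."""
    return float(np.dot(w, y) / np.dot(w, x))

y, x, w = simulate_endogenous_sample(n=200_000)
print("least-squares slope (biased by E(e | X)):", ols_slope(y, x))
print("instrumental slope:", iv_slope(y, x, w))
```

In this design the least-squares slope converges to beta + Cov(X, e)/Var(X), which differs from beta, whereas the instrumental slope converges to beta.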



Now assume that the set $\Omega$ is bounded and $f_{YXW}$ is bounded away from zero. We 
consider the operator $T\colon L^2(\Omega_X) \to L^2(\Omega_W)$ defined by 

\[ T\psi(w) := E(\psi(X) \mid W = w) = \int_{\Omega_X} \psi(x)\,\frac{f_{XW}(x,w)}{f_W(w)}\,dx. \tag{3} \]

Then (2) can be rewritten as the Fredholm integral equation 

\[ Tg = h, \tag{4} \]

where 

\[ h(w) = E(Y \mid W = w) = \int_{\Omega_Y} y\,\frac{f_{YW}(y,w)}{f_W(w)}\,dy. \]
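To get a feeling for the operator T in (3), the following sketch discretizes it with a midpoint rule. It assumes a hypothetical joint density of a scalar X and a scalar W on the unit square (not taken from this paper), and all names are illustrative. It checks that the discretized T maps the constant function 1 to itself, as a conditional expectation must, and prints the ratio of the largest to the smallest singular value, whose size reflects the ill-posedness of (4).

```python
import numpy as np

def conditional_expectation_matrix(f_xw, dx):
    """Discretize (T psi)(w) = int psi(x) f_XW(x, w) / f_W(w) dx.

    f_xw[i, j] is the joint density at (x_i, w_j); the returned matrix has
    one row per grid point w_j and one column per grid point x_i.
    """
    f_w = f_xw.sum(axis=0) * dx          # marginal density of W on the grid
    return (f_xw / f_w).T * dx

# Hypothetical joint density on [0, 1]^2, bounded away from zero.
m = 60
dx = 1.0 / m
grid = (np.arange(m) + 0.5) * dx
xx, ww = np.meshgrid(grid, grid, indexing="ij")
f_xw = 0.2 + np.exp(-((xx - ww) ** 2) / 0.05)
f_xw /= f_xw.sum() * dx * dx             # normalize to total mass one

T = conditional_expectation_matrix(f_xw, dx)
singular_values = np.linalg.svd(T, compute_uv=False)
print("max |T 1 - 1|:", np.abs(T @ np.ones(m) - 1.0).max())
print("largest / smallest singular value:", singular_values[0] / singular_values[-1])
```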

In addition, classical microeconomic theory imposes some shape restrictions on the 
consumer demand, and the challenge is to take these constraints into account in the 
nonparametric estimation of the function g. More precisely, standard microeconomic 
theory states that the demand is the result of the maximization of some 
(unknown) utility function. That is, there exists some function $u\colon \mathbb{R}^k_{\geq 0} \to \mathbb{R}$ (the utility) 
such that 

\[ g(x) = \operatorname{argmax}\{u(y) : y \in \mathbb{R}^k_{\geq 0},\ \langle y, p\rangle \leq z\}, \tag{5} \]

where x = (p, z). Here the utility function is assumed to be continuously differentiable, 
concave, and strictly monotonically increasing. Even though the utility is unknown, 
the assumption of its existence (and of utility maximization) has some implications for 
the demand function g, called the integrability conditions. First, it is rather obvious 
that g is homogeneous of degree 0, that is, g(tx) = g(x) for every t > 0. Moreover, 
the maximum in (5) is always attained at the boundary; more precisely, we have the 
equality 

\[ \langle p, g(x)\rangle = z; \tag{6} \]

this condition is usually called the budget constraint. Finally, defining the Slutsky matrix 

\[ S_g(x) := \nabla_p g(x) + \partial_z g(x) \cdot g(x)^T, \]

the conditions 

\[ S_g(x) = S_g(x)^T \quad\text{and}\quad S_g(x) \leq 0 \tag{7} \]

hold. That is, the Slutsky matrix is symmetric and negative semi-definite in (almost) 
every point x = (p, z). 

Therefore, the objective of this work is to recover the function g characterized by 
equation (4) and satisfying the constraints defined by the Slutsky matrix. 
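The integrability conditions can be checked numerically on a textbook example. The sketch below assumes a Cobb-Douglas utility (a choice made here for illustration, not an example from this paper), for which the demand is g(p, z) = shares * z / p, and approximates the Slutsky matrix by central finite differences. The budget shares, prices, budget and step size are hypothetical.

```python
import numpy as np

def cobb_douglas_demand(p, z, shares):
    """Demand maximizing prod_i y_i**shares_i subject to <p, y> <= z."""
    return shares * z / p

def slutsky_matrix(demand, p, z, step=1e-6):
    """Central finite-difference approximation of S_g = grad_p g + d_z g * g^T."""
    k = p.size
    grad_p = np.empty((k, k))
    for j in range(k):
        dp = np.zeros(k)
        dp[j] = step
        grad_p[:, j] = (demand(p + dp, z) - demand(p - dp, z)) / (2 * step)
    d_z = (demand(p, z + step) - demand(p, z - step)) / (2 * step)
    return grad_p + np.outer(d_z, demand(p, z))

shares = np.array([0.5, 0.3, 0.2])     # hypothetical budget shares, summing to 1
p = np.array([1.0, 2.0, 4.0])
z = 10.0

def g(prices, budget):
    return cobb_douglas_demand(prices, budget, shares)

S = slutsky_matrix(g, p, z)
print("budget constraint <p, g(x)> - z:", p @ g(p, z) - z)
print("0-homogeneity, max |g(3x) - g(x)|:", np.abs(g(3 * p, 3 * z) - g(p, z)).max())
print("asymmetry of S:", np.abs(S - S.T).max())
print("largest eigenvalue of S:", np.linalg.eigvalsh((S + S.T) / 2).max())
```

Up to finite-difference error, the matrix S is symmetric and its largest eigenvalue is zero (attained in the direction of p), so the conditions (6) and (7) hold for this demand.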

The paper is organized as follows: In Section 2, we present our model, the link 
with ill-posed inverse problems in the case where the transform is unknown, and the 
conditions under which a regularized solution can be defined. In Section 3 we derive 
rates of convergence in a deterministic setting and we extend the results in Section 4 to 
the statistical setting. 

2. Constrained Inversion of T 

Let now T be the operator defined in (3) (operating on vector valued functions). Then, 
in order to recover g, we have to solve the equation 

\[ Tg = h, \]

where h denotes the right hand side of (2), subject to the constraints that g is 
homogeneous of degree 0 and satisfies the budget constraint (6) and the Slutsky 
condition (7) almost everywhere in $\Omega := \Omega_X = \Omega_P \times \Omega_Z$. In the following we will always 
assume that the set $\Omega$ is bounded, open, connected and has a Lipschitz boundary. 

Apart from the constraints, there are three problems: First, the operator T is 
defined by the density $f_{XW}$, which is not known exactly but can only be estimated up 
to a certain error $\delta$. Consequently, we will only have an approximation $T^\delta$ of T available. 
Second, the right hand side h is only known up to some error $\gamma$, as it may be prone to 
measurement errors (in a deterministic setting) or is the realization of a random variable 
(in a stochastic setting), and, again, it depends on the density $f_{YW}$. In addition, the 
assumption $E(e \mid W) = 0$ need not hold exactly. Finally, the operator T (and also its 
approximation $T^\delta$) is not boundedly invertible in $L^2(\Omega;\mathbb{R}^k)$. Thus a direct solution of 
the operator equation 

\[ T^\delta g = h^\gamma \]

does not make sense, as its solution $g^{\delta,\gamma}$ (if it exists) need not be close to the true solution 
$g^\dagger$, even if the errors $\delta$ and $\gamma$ are small. In addition, there is no reason why the exact 
solution of the perturbed operator equation (if it exists) should satisfy the required 
constraints, in particular, as the constraints are non-linear and describe a non-convex 
set. 

In order to find a solution nevertheless, it is necessary to consider some kind of 
regularized solution. In the following, we consider the application of (constrained) 
Tikhonov regularization, where we use the (weighted) first order Sobolev norm as 
regularization functional. That is, denoting for $\mu \geq 0$ by 

\[ \|g\|_\mu^2 := \mu\|g\|_{L^2}^2 + \|\nabla g\|_{L^2}^2 \tag{8} \]

the weighted Sobolev norm, one minimizes, for some regularization parameter $\alpha > 0$ 
depending on $\delta$ and $\gamma$, the functional 

\[ \mathcal{T}_\alpha(g; T^\delta, h^\gamma) := \|T^\delta g - h^\gamma\|_{L^2}^2 + \alpha\|g\|_\mu^2 \]

subject to the constraints of positivity, 0-homogeneity, the Slutsky condition, and the 
budget constraint. For the sake of simplicity, we will omit in the following the subscripts 
in the $L^2$-norms and we will assume that $\Omega$ is compactly contained in $\mathbb{R}^k_{\geq 0} \times \mathbb{R}_{>0}$. 
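The effect of the penalty (8) can be seen in a discretized, unconstrained version of the Tikhonov functional. The following sketch deliberately ignores the shape constraints (positivity, homogeneity, budget and Slutsky conditions), so it is not the constrained method of this paper; it assumes a scalar function on [0, 1], a Gaussian smoothing matrix as a stand-in for $T^\delta$, forward differences for the gradient, and hypothetical values for alpha, mu and the noise level.

```python
import numpy as np

def forward_difference(m, dx):
    """Forward-difference matrix of shape (m - 1, m) approximating g'."""
    return (np.eye(m, k=1) - np.eye(m))[:-1] / dx

def tikhonov_functional(g, T, h, alpha, mu, dx):
    """Discrete version of ||T g - h||^2 + alpha * (mu ||g||^2 + ||g'||^2)."""
    D = forward_difference(g.size, dx)
    residual = T @ g - h
    penalty = mu * (g @ g) + (D @ g) @ (D @ g)
    return dx * (residual @ residual + alpha * penalty)

def tikhonov_minimizer(T, h, alpha, mu, dx):
    """Unconstrained minimizer of the functional, via the normal equations."""
    m = T.shape[1]
    D = forward_difference(m, dx)
    lhs = T.T @ T + alpha * (mu * np.eye(m) + D.T @ D)
    return np.linalg.solve(lhs, T.T @ h)

# Hypothetical test problem: Gaussian smoothing as a stand-in for the operator.
m = 50
dx = 1.0 / m
grid = (np.arange(m) + 0.5) * dx
kernel = np.exp(-((grid[:, None] - grid[None, :]) ** 2) / (2 * 0.1 ** 2))
T = kernel / kernel.sum(axis=1, keepdims=True)      # each row sums to one
g_true = 1.0 + 0.5 * np.cos(2 * np.pi * grid)
rng = np.random.default_rng(0)
h_noisy = T @ g_true + 1e-3 * rng.normal(size=m)

g_naive = np.linalg.lstsq(T, h_noisy, rcond=None)[0]  # direct inversion
g_reg = tikhonov_minimizer(T, h_noisy, alpha=1e-3, mu=1.0, dx=dx)
print("error of direct inversion:     ", np.linalg.norm(g_naive - g_true))
print("error of regularized solution: ", np.linalg.norm(g_reg - g_true))
```

Because the functional is quadratic without the constraints, its minimizer solves a linear system; with the non-convex constraint set this is no longer the case, which is why existence is established below by the direct method instead.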



Nonparametric instrumental regression with non- convex constraints 






We use in the following the abbreviation 

\[ X := \{g \in H^1(\Omega;\mathbb{R}^k) : g \geq 0 \text{ is 0-homogeneous},\ \langle p, g(x)\rangle = z \text{ and } S_g = S_g^T \leq 0 \text{ a.e.}\}. \]

Then one can define 

\[ g_\alpha^{\delta,\gamma} := \operatorname{argmin}\{\|T^\delta g - h^\gamma\|^2 + \alpha\|g\|_\mu^2 : g \in X\}, \]

provided the Tikhonov functional attains its minimum in X. In the following, we will 
show that this is indeed the case. The proof is based on the direct method in the 
calculus of variations. As a first important result, we prove that the set X is weakly 
closed in $H^1(\Omega;\mathbb{R}^k)$, which is not an obvious assertion, as X is non-convex, and the 
weak closedness of a subset of a Hilbert space is usually strongly tied to its convexity. 

Lemma 2.1. The set X is weakly sequentially closed in $H^1(\Omega;\mathbb{R}^k)$. 

Proof. Obviously the set of non-negative 0-homogeneous functions satisfying the budget 
constraint $\langle p, g(x)\rangle = z$ is convex and closed in $H^1(\Omega;\mathbb{R}^k)$, implying that it is also weakly 
closed. 

Next we show that the mapping $S\colon H^1(\Omega;\mathbb{R}^k) \to L^1(\Omega;\mathbb{R}^{k\times k})$, 

\[ g \mapsto S(g) = \nabla_p g + \partial_z g \cdot g^T \]

is weak-weak continuous. To that end assume that the sequence $(g_n)_{n\in\mathbb{N}}$ weakly 
converges to $g \in H^1(\Omega;\mathbb{R}^k)$. Then $\nabla g_n$ weakly converges to $\nabla g$ in $L^2(\Omega;\mathbb{R}^{k\times(k+1)})$ 
(which in particular implies that the sequence is bounded) and the Rellich-Kondrachov 
compactness theorem (see [TJ Thm. 6.2]) implies that the functions $g_n$ converge strongly 
to g with respect to the $L^2$ topology. Thus, if $1 \leq i, j \leq k$ and $u \in L^\infty(\Omega;\mathbb{R})$, we have 

\begin{align*}
|\langle \partial_z g_n^{(i)} g_n^{(j)} - \partial_z g^{(i)} g^{(j)}, u\rangle|
&\leq |\langle \partial_z g_n^{(i)} (g_n^{(j)} - g^{(j)}), u\rangle| + |\langle (\partial_z g_n^{(i)} - \partial_z g^{(i)}) g^{(j)}, u\rangle| \\
&\leq \|u\|_{L^\infty}\,\|g_n^{(j)} - g^{(j)}\|_{L^2}\,\|\partial_z g_n^{(i)}\|_{L^2} + |\langle \partial_z g_n^{(i)} - \partial_z g^{(i)}, g^{(j)} u\rangle| \to 0.
\end{align*}

Consequently the product $\partial_z g_n \cdot g_n^T$ converges to $\partial_z g \cdot g^T$ with respect to the weak topology 
on $L^1(\Omega;\mathbb{R}^{k\times k})$. 

Now note that the set $\mathrm{Sym}^-$ of all symmetric and negative semi-definite (k x k)-matrices 
is a closed and convex cone in $\mathbb{R}^{k\times k}$. Consequently also the set of all summable 
functions on $\Omega$ with values in $\mathrm{Sym}^-$ is a closed and convex cone in $L^1(\Omega;\mathbb{R}^{k\times k})$ and 
therefore, in particular, also weakly closed. Therefore the weak-weak continuity of 
the mapping S implies that the set of functions $g \in H^1(\Omega;\mathbb{R}^k)$ satisfying the Slutsky 
condition $S(g) = S(g)^T \leq 0$ is weakly closed. 

This shows that the set X is the intersection of the (weakly closed) set of 
0-homogeneous, non-negative functions satisfying the budget constraint with a weakly 
closed set, which proves that X itself is weakly closed in $H^1(\Omega;\mathbb{R}^k)$. □ 

For the usage of the direct method in the calculus of variations, we still have to 
prove the coercivity of the regularization functional. In the case $\mu > 0$, the coercivity 
is obvious, as the regularization term is equivalent to the $H^1$-norm; in the case $\mu = 0$, 
however, the equivalence holds only if the operator T does not annihilate constant 
functions (see (21 ED] for a related result on total variation regularization). In the next 
result, we provide a detailed proof of this assertion by explicitly computing constants 
defining this equivalence of norms. In particular, the results show that these constants 
depend continuously on the operator T, which will be required in the proof of the 
convergence result, where we also treat the case of operator errors. 

Lemma 2.2. Assume that $T\colon L^2(\Omega;\mathbb{R}^k) \to L^2(\Omega;\mathbb{R}^k)$ is a bounded linear operator. If 
$\mu = 0$, assume in addition that $Tc \neq 0$ for every non-zero constant function $c\colon \Omega \to \mathbb{R}^k$. 
Define for $g \in H^1(\Omega;\mathbb{R}^k)$ 

\[ \|g\|_T^2 := \|g\|_\mu^2 + \|Tg\|_{L^2}^2. \tag{9} \]

Then $\|\cdot\|_T$ is a norm on $H^1(\Omega;\mathbb{R}^k)$ that is equivalent to the standard $H^1$-norm. More 
precisely, we have the following estimates: For every $\mu \geq 0$, 

Nlr< \\T\\\\g\\m; (10) 
if $\mu > 0$ in (9), then 

\[ \|g\|_{H^1} \leq \frac{1}{\min\{\sqrt{\mu}, 1\}}\,\|g\|_T, \tag{11} \]

and if $\mu = 0$ in (9), then there exists a constant $A > 0$ only depending on the set $\Omega$ such 
that 

\[ \|g\|_{H^1} \leq A\bigl(\|T\| D(T)^{-1} + D(T)^{-1} + 1\bigr)\|g\|_T, \tag{12} \]

where 

\[ D(T) := \inf\{\|Tc\| : c\colon \Omega \to \mathbb{R}^k \text{ is constant with } |c| = 1\}. \]
Proof. Inequality (10) follows from 

Ibllr < II^IIlz < ll^llllfi , l|L2 < ||T||||p|| H i, 
and (11) is trivial. 

Now assume that $\mu = 0$. Then the assertion $Tc \neq 0$ for every non-zero constant 
function $c\colon \Omega \to \mathbb{R}^k$ implies that $0 < D(T) < +\infty$. Define now the projection 
$P\colon L^2(\Omega;\mathbb{R}^k) \to L^2(\Omega;\mathbb{R}^k)$, $g \mapsto Pg := \frac{1}{|\Omega|}\int_\Omega g$. Then 

\begin{align*}
\|g\|^2 &= \|g - Pg\|^2 + \|Pg\|^2 \\
&\leq \|g - Pg\|^2 + D(T)^{-2}\|TPg\|^2 \\
&\leq \|g - Pg\|^2 + 2D(T)^{-2}\bigl(\|Tg\|^2 + \|T(g - Pg)\|^2\bigr) \\
&\leq \bigl(1 + 2D(T)^{-2}\|T\|^2\bigr)\|g - Pg\|^2 + 2D(T)^{-2}\|Tg\|^2.
\end{align*}

From the Poincaré inequality (see e.g. [32, Thm. 4.8.1]) it follows that there exists C > 0 
such that $\|g - Pg\| \leq C\|\nabla g\|$. Thus 

\begin{align*}
\|g\|_{H^1}^2 &= \|g\|^2 + \|\nabla g\|^2 \\
&\leq \bigl(C^2(1 + 2D(T)^{-2}\|T\|^2) + 1\bigr)\|\nabla g\|^2 + 2D(T)^{-2}\|Tg\|^2.
\end{align*}

Setting A = 2(C + 1) we obtain (12). □ 
Lemma 2.3. Assume that $T^\delta\colon L^2(\Omega;\mathbb{R}^k) \to L^2(\Omega;\mathbb{R}^k)$ is bounded linear, $h^\gamma \in 
L^2(\Omega;\mathbb{R}^k)$, $\alpha > 0$, and $\mu \geq 0$. If $\mu = 0$, assume in addition that $T^\delta c \neq 0$ for 
every non-zero constant function $c\colon \Omega \to \mathbb{R}^k$. Then the regularization functional 
$\mathcal{T}_\alpha\colon L^2(\Omega;\mathbb{R}^k) \to \mathbb{R}_{\geq 0} \cup \{+\infty\}$, 

\[ \mathcal{T}_\alpha(g; T^\delta, h^\gamma) := \begin{cases} \|T^\delta g - h^\gamma\|^2 + \alpha\|g\|_\mu^2 & \text{if } g \in X, \\ +\infty & \text{else,} \end{cases} \]

attains its minimum. 

Proof. The weak closedness of the set X and the weak lower semi-continuity of the 
mapping $g \mapsto \|T^\delta g - h^\gamma\|^2 + \alpha\|g\|_\mu^2$ on the space $H^1(\Omega;\mathbb{R}^k)$ imply that also the mapping 
$\mathcal{T}_\alpha(\cdot; T^\delta, h^\gamma)$ is weakly lower semi-continuous. Moreover, Lemma 2.2 implies that 
$\mathcal{T}_\alpha(\cdot; T^\delta, h^\gamma)$ is weakly coercive. Applying the direct method in the calculus of variations, 
we obtain the existence of a minimizer. □ 

Note that the previous result does not say anything about the uniqueness of the 
minimizer. Because of the non-convexity of the set X, it is probable that the Tikhonov 
functional has multiple local minima, but also possible that it has several global minima. 

The following result is very similar to the convergence result in [25]. The 
main difference is that we also consider the homogeneous Sobolev semi-norm as a 
regularization term, which is not coercive by itself. The coercivity (or rather the 
equi-coercivity of the functionals $\mathcal{T}_\alpha(\cdot; T^\delta, h^\gamma)/\alpha$) is only obtained by means of Lemma 2.2. 

Proposition 2.4. Assume that $T\colon L^2(\Omega;\mathbb{R}^k) \to L^2(\Omega;\mathbb{R}^k)$ is bounded linear satisfying 
$Tc \neq 0$ for every non-zero constant function $c\colon \Omega \to \mathbb{R}^k$ and that the operator equation 
Tg = h has a solution in X. Let $\delta_j \to 0$, $\gamma_j \to 0$ and assume that $T^{\delta_j}\colon L^2(\Omega;\mathbb{R}^k) \to 
L^2(\Omega;\mathbb{R}^k)$ are bounded linear operators satisfying $\|T^{\delta_j} - T\| \leq \delta_j$ and that the functions 
$h^{\gamma_j} \in L^2(\Omega;\mathbb{R}^k)$ satisfy $\|h^{\gamma_j} - h\| \leq \gamma_j$. Let $\mu \geq 0$ be fixed; if $\mu = 0$, assume in addition 
that $T^{\delta_j} c \neq 0$ for every non-zero constant function $c\colon \Omega \to \mathbb{R}^k$. 

Assume that $\alpha_j > 0$ is chosen such that $\alpha_j \to 0$ and $(\delta_j + \gamma_j)^2/\alpha_j \to 0$. Then every 
sequence $(g_j)_{j\in\mathbb{N}} \subset X$ satisfying 

\[ g_j \in \operatorname{argmin}\{\mathcal{T}_{\alpha_j}(g; T^{\delta_j}, h^{\gamma_j}) : g \in X\} \]

has a subsequence converging with respect to the $H^1$-norm to some 

\[ g^\dagger \in \operatorname{argmin}\{\|g\|_\mu^2 : Tg = h,\ g \in X\}. \]

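Proof. Let $\tilde g$ be any solution of Tg = h in X. Then 

\begin{align*}
\|T^{\delta_j} g_j - h^{\gamma_j}\|^2 + \alpha_j\|g_j\|_\mu^2
&\leq \|T^{\delta_j}\tilde g - h^{\gamma_j}\|^2 + \alpha_j\|\tilde g\|_\mu^2 \\
&\leq \bigl(\|T^{\delta_j} - T\|\,\|\tilde g\| + \|h - h^{\gamma_j}\|\bigr)^2 + \alpha_j\|\tilde g\|_\mu^2 \\
&\leq (\delta_j\|\tilde g\| + \gamma_j)^2 + \alpha_j\|\tilde g\|_\mu^2.
\end{align*}

Consequently, 

\begin{align*}
\|g_j\|_{T^{\delta_j}}^2
&\leq 2\|T^{\delta_j} g_j - h^{\gamma_j}\|^2 + 2\|h^{\gamma_j}\|^2 + \|g_j\|_\mu^2 \\
&\leq 2(\delta_j\|\tilde g\| + \gamma_j)^2 + 2\alpha_j\|\tilde g\|_\mu^2 + 2\|h^{\gamma_j}\|^2 + \frac{(\delta_j\|\tilde g\| + \gamma_j)^2}{\alpha_j} + \|\tilde g\|_\mu^2.
\end{align*}

From Lemma 2.2 it follows that $\|g_j\|_{T^{\delta_j}} \geq C(T^{\delta_j})\|g_j\|_{H^1}$ for some constants $C(\cdot) > 0$ 
depending continuously on the operator $T^{\delta_j}$. Therefore, the assumption $(\delta_j + \gamma_j)^2/\alpha_j \to 0$ 
implies that the sequence $(g_j)_{j\in\mathbb{N}}$ is bounded. The proof of the subsequential convergence 
is now along the lines of [26, Thm. 3.26]. □ 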



3. Convergence Rates 

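Lemma 3.1. Assume that $T\colon L^2(\Omega;\mathbb{R}^k) \to L^2(\Omega;\mathbb{R}^k)$ is bounded linear and that the 
equation Tg = h has a solution in X. Let 

\[ g^\dagger \in \operatorname{argmin}\{\|g\|_\mu^2 : Tg = h,\ g \in X\}. \]

Let moreover $T^\delta\colon L^2(\Omega;\mathbb{R}^k) \to L^2(\Omega;\mathbb{R}^k)$ satisfy $\|T^\delta - T\|_{L^2} \leq \delta$, and let $h^\gamma \in L^2(\Omega;\mathbb{R}^k)$ 
satisfy $\|h^\gamma - h\| \leq \gamma$. If $\mu = 0$, assume in addition that $\delta < \|T\|$ and $Tc \neq 0$ for every 
non-zero constant function $c\colon \Omega \to \mathbb{R}^k$. Assume that there exists a set $L \subset X$ such that, 
for some $\beta > 0$, $C > 0$ and every $g \in L$, we have 

\[ \beta\|g - g^\dagger\|_\mu^2 \leq \|g\|_\mu^2 - \|g^\dagger\|_\mu^2 + C\|T(g - g^\dagger)\|. \tag{13} \]

Let moreover 

\[ g_\alpha^{\delta,\gamma} \in \operatorname{argmin}\{\mathcal{T}_\alpha(g; T^\delta, h^\gamma) : g \in X\}. \]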
Define for fi > 

n , , x \\g% 

and let 



Do(a,5,7) := ^ j 



T|| + D(T) + 1 f + ^|bt|| + 7 + v^|| Vpt 



D(T)-S L ' min{Va,l} 
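with A > 0 and D(T) > 0 as in Lemma 2.2. 
Then the estimates 

\[ \beta\|g_\alpha^{\delta,\gamma} - g^\dagger\|_\mu^2 \leq \frac{(\gamma + \delta\|g^\dagger\|)^2}{\alpha} + C\bigl(\gamma + \delta D_\mu(\alpha,\delta,\gamma)\bigr) + \frac{C^2\alpha}{4} \]

and 

\[ \|T(g_\alpha^{\delta,\gamma} - g^\dagger)\|^2 \leq 2(\gamma + \delta\|g^\dagger\|)^2 + 2\alpha C\bigl(\gamma + \delta D_\mu(\alpha,\delta,\gamma)\bigr) + C^2\alpha^2 \]

hold whenever $g_\alpha^{\delta,\gamma} \in L$. 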

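Proof. The inequality (13) and the optimality of $g_\alpha^{\delta,\gamma}$ imply that 

\begin{align*}
\beta\|g_\alpha^{\delta,\gamma} - g^\dagger\|_\mu^2
&\leq \|g_\alpha^{\delta,\gamma}\|_\mu^2 - \|g^\dagger\|_\mu^2 + C\|T(g_\alpha^{\delta,\gamma} - g^\dagger)\| \\
&\leq \frac{1}{\alpha}\bigl(\|T^\delta g^\dagger - h^\gamma\|^2 - \|T^\delta g_\alpha^{\delta,\gamma} - h^\gamma\|^2\bigr) + C\|T(g_\alpha^{\delta,\gamma} - g^\dagger)\| \\
&\leq \frac{(\gamma + \delta\|g^\dagger\|)^2}{\alpha} + C\bigl(\gamma + \delta\|g_\alpha^{\delta,\gamma}\|\bigr) + C\|T^\delta g_\alpha^{\delta,\gamma} - h^\gamma\| - \frac{\|T^\delta g_\alpha^{\delta,\gamma} - h^\gamma\|^2}{\alpha}.
\end{align*}

Estimating 

\[ C\|T^\delta g_\alpha^{\delta,\gamma} - h^\gamma\| - \frac{\|T^\delta g_\alpha^{\delta,\gamma} - h^\gamma\|^2}{\alpha} \leq \sup_{t > 0}\Bigl(Ct - \frac{t^2}{\alpha}\Bigr) = \frac{C^2\alpha}{4}, \]

we obtain the inequality 

\[ \beta\|g_\alpha^{\delta,\gamma} - g^\dagger\|_\mu^2 \leq \frac{(\gamma + \delta\|g^\dagger\|)^2}{\alpha} + C\bigl(\gamma + \delta\|g_\alpha^{\delta,\gamma}\|\bigr) + \frac{C^2\alpha}{4}. \tag{14} \]

Moreover, using the estimate 

\begin{align*}
C\|T^\delta g_\alpha^{\delta,\gamma} - h^\gamma\| - \frac{\|T^\delta g_\alpha^{\delta,\gamma} - h^\gamma\|^2}{\alpha}
&\leq \sup_{t > 0}\Bigl(Ct - \frac{t^2}{2\alpha}\Bigr) - \frac{\|T^\delta g_\alpha^{\delta,\gamma} - h^\gamma\|^2}{2\alpha} \\
&= \frac{C^2\alpha}{2} - \frac{\|T^\delta g_\alpha^{\delta,\gamma} - h^\gamma\|^2}{2\alpha},
\end{align*}

we obtain 

\[ \|T(g_\alpha^{\delta,\gamma} - g^\dagger)\|^2 \leq 2(\gamma + \delta\|g^\dagger\|)^2 + 2\alpha C\bigl(\gamma + \delta\|g_\alpha^{\delta,\gamma}\|\bigr) + C^2\alpha^2. \tag{15} \]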

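Assume first that $\mu > 0$. Then the definition of $\|\cdot\|_\mu$ and the optimality of $g_\alpha^{\delta,\gamma}$ imply 
the estimate 

which proves the assertion for $\mu$ strictly positive. 

Now assume that $\mu = 0$. Then Lemma 2.2 implies that, using the same notation 
as in the Lemma, 

\[ \|g_\alpha^{\delta,\gamma}\| \leq A\bigl(\|T^\delta\| D(T^\delta)^{-1} + D(T^\delta)^{-1} + 1\bigr)\|g_\alpha^{\delta,\gamma}\|_{T^\delta}. \]

Moreover, for $\|T\| > \delta$, we have 

\begin{align*}
D(T^\delta) &= \inf\{\|T^\delta c\| : c\colon \Omega \to \mathbb{R}^k \text{ is constant with } |c| = 1\} \\
&\geq \inf\{\|Tc\| : c\colon \Omega \to \mathbb{R}^k \text{ is constant with } |c| = 1\} - \delta \\
&= D(T) - \delta.
\end{align*}

Therefore, 

\begin{align*}
\|g_\alpha^{\delta,\gamma}\| &\leq A\bigl((\|T\| + \delta)(D(T) - \delta)^{-1} + (D(T) - \delta)^{-1} + 1\bigr)\|g_\alpha^{\delta,\gamma}\|_{T^\delta} \\
&= A\,\frac{\|T\| + D(T) + 1}{D(T) - \delta}\,\|g_\alpha^{\delta,\gamma}\|_{T^\delta}.
\end{align*}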

Now the optimality of g^ implies that 

\\9 5 f\\T*<\\T s g 5 f- W\\ + ||V^|| + ||^|| 

(2T a (^;T^)) 1/2 
minj-y/a, 1} 

(2r Q (^ ; r^,^)) 1/2 

min{A/a, 1} 



< IN +7 

< INI+7 + 




< M+7 + vsM±i+^fM. 

min{A/a, 1} 



-D(T) — o L mm{y/a, 1} 



Together, these estimates show that 

Inserting this inequality in (14) and (15) proves the assertion for $\mu = 0$. □ 

In the next result, we will present concrete conditions that imply the inequality (13). 
These conditions are a generalization of projected source conditions, which are a classical 
concept in the theory of inverse problems with convex constraints (see [TOj [151 121]), to 
a non-convex setting. Recently, the relation between projected source conditions and 
variational inequalities of the type (13) has also been studied in [IB], though still in a 
convex setting. In order to generalize this concept to non-convex constraints, we recall 
the notion of a proximal normal cone to a subset of a Hilbert space (see [12]). 

Definition 3.2. Let Y be a Hilbert space and let $S \subset Y$ be non-empty. We define for 
$y \in Y$ the set $\operatorname{proj}_S(y) \subset S$ as the set of all points $z \in S$ for which the distance to y is 
minimal. Moreover we define for $z \in S$ the proximal normal cone $N_S^P(z)$ to S at z as 

\[ N_S^P(z) := \{\zeta = t(y - z) \in Y : t \geq 0,\ z \in \operatorname{proj}_S(y)\}. \]

See also Figured) 

For the following result, see [12, Prop. 1.5]. 

Proposition 3.3. A vector $\zeta$ belongs to $N_S^P(z)$, if and only if there exists $r \geq 0$ (possibly 
depending on $\zeta$ and z) such that 

\[ \langle \zeta, y - z\rangle \leq r\|y - z\|^2 \tag{16} \]

for all $y \in S$. 

In the following we will denote, for given $z \in S$ and $\zeta \in N_S^P(z)$, by $r(\zeta, z)$ the 
smallest $r \geq 0$ for which (16) holds. Then the function r is positively homogeneous 
with respect to its first variable, that is, $r(t\zeta, z) = t\,r(\zeta, z)$ whenever $\zeta \in N_S^P(z)$ and 
$t \geq 0$ (note that the fact that $N_S^P(z)$ is a cone implies that $t\zeta \in N_S^P(z)$). 
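A simple non-convex example makes the quantity r concrete. The sketch below takes S to be the unit circle in the plane (an illustration chosen here, not an example from this paper). For z on the circle, an inward-pointing proximal normal is a negative multiple of z, and since $\|y - z\|^2 = 2 - 2\langle y, z\rangle$ on the circle, the smallest admissible constant in (16) is half the length of the normal; outward-pointing normals satisfy (16) with r = 0. The code checks these statements on a fine sample of S.

```python
import numpy as np

def project_onto_circle(y):
    """Nearest point of the unit circle to y (unique whenever y != 0)."""
    return y / np.linalg.norm(y)

def proximal_inequality_gap(zeta, z, r, samples):
    """Largest value of <zeta, y - z> - r |y - z|^2 over the sample points y.

    The inequality (16) holds on the samples exactly when this is <= 0.
    """
    diff = samples - z
    return float(np.max(diff @ zeta - r * np.sum(diff ** 2, axis=1)))

angles = np.linspace(0.0, 2.0 * np.pi, 2001)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
z = np.array([1.0, 0.0])

# y_inside projects onto z, so zeta = t (y_inside - z) is a proximal normal at z.
y_inside = 0.5 * z
zeta = 3.0 * (y_inside - z)              # t = 3, inward normal of length 1.5

print("projection of y_inside:", project_onto_circle(y_inside))
print("gap with r = |zeta| / 2:", proximal_inequality_gap(zeta, z, 0.75, circle))
print("gap with r = 0.7:", proximal_inequality_gap(zeta, z, 0.7, circle))
print("gap for the outward normal, r = 0:", proximal_inequality_gap(-zeta, z, 0.0, circle))
```

For a convex set the constant r can always be taken to be zero; the strictly positive value for inward normals of the circle is what gives rise to the smallness condition in the results below.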

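Theorem 3.4. Assume that $g^\dagger \in X$ satisfies $Tg^\dagger = h$. In addition, assume that 
$\partial_\nu g^\dagger = 0$ on $\partial\Omega$. Denote moreover by $T^*\colon L^2(\Omega;\mathbb{R}^k) \to L^2(\Omega;\mathbb{R}^k)$ the adjoint of T 
and let $N_X^P(g^\dagger) \subset L^2(\Omega;\mathbb{R}^k)$ be the proximal normal cone to the set X at the point $g^\dagger$. 
Assume that there exist $\omega \in L^2(\Omega;\mathbb{R}^k)$ and $\zeta \in N_X^P(g^\dagger)$ such that 

\[ 2(\mu g^\dagger - \Delta g^\dagger) = T^*\omega + \zeta. \]

• If $\mu > 0$ and $r(\zeta, g^\dagger) < \mu$, then (13) holds for every $g \in X$ with $C = \|\omega\|$ and 
$\beta = 1 - r(\zeta, g^\dagger)/\mu$. 

• If $\mu = 0$, assume in addition that $Tc \neq 0$ for every non-zero constant function 
$c\colon \Omega \to \mathbb{R}^k$ and that 

\[ E := A^2\bigl(\|T\| D(T)^{-1} + D(T)^{-1} + 1\bigr)^2\, r(\zeta, g^\dagger), \]

with A and D(T) as in Lemma 2.2, satisfies E < 1. Then for every s > 0 the 
inequality (13) holds with $\beta = 1 - E$ and $C = \|\omega\| + sE$ whenever $g \in X$ satisfies 
$\|T(g - g^\dagger)\| \leq s$. 

Proof. First note that 

\begin{align*}
\langle 2\mu g^\dagger - 2\Delta g^\dagger - \zeta, g^\dagger - g\rangle &= \langle T^*\omega, g^\dagger - g\rangle \\
&= \langle \omega, T(g^\dagger - g)\rangle \\
&\leq \|\omega\|\,\|T(g^\dagger - g)\|.
\end{align*}

Now the assumption $\zeta \in N_X^P(g^\dagger)$ implies that 

\[ \langle \zeta, g^\dagger - g\rangle \leq r(\zeta, g^\dagger)\,\|g^\dagger - g\|^2 \]

for all $g \in X$. In addition, Stokes' theorem and the assumption $\partial_\nu g^\dagger = 0$ on $\partial\Omega$ imply 
that 

\begin{align*}
2\langle \mu g^\dagger - \Delta g^\dagger, g^\dagger - g\rangle &= 2\mu\langle g^\dagger, g^\dagger - g\rangle + 2\langle \nabla g^\dagger, \nabla(g^\dagger - g)\rangle \\
&= \|g^\dagger - g\|_\mu^2 + \|g^\dagger\|_\mu^2 - \|g\|_\mu^2.
\end{align*}

Thus we obtain the estimate 

\[ \|\omega\|\,\|T(g^\dagger - g)\| \geq \|g^\dagger\|_\mu^2 - \|g\|_\mu^2 + \|g^\dagger - g\|_\mu^2 - r(\zeta, g^\dagger)\,\|g^\dagger - g\|^2. \tag{17} \]

In the case $\mu > 0$, it follows that 

\[ \bigl(1 - r(\zeta, g^\dagger)/\mu\bigr)\|g^\dagger - g\|_\mu^2 \leq \|g\|_\mu^2 - \|g^\dagger\|_\mu^2 + \|\omega\|\,\|T(g - g^\dagger)\|, \]

which proves the first part of the assertion. 

On the other hand, if $\mu = 0$, then (17) and Lemma 2.2 imply that 

\[ (1 - E)\|\nabla(g^\dagger - g)\|^2 \leq \|\nabla g\|^2 - \|\nabla g^\dagger\|^2 + \|\omega\|\,\|T(g - g^\dagger)\| + E\,\|T(g - g^\dagger)\|^2. \]

Thus (13) holds for $\|T(g - g^\dagger)\| \leq s$. □ 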
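Corollary 3.5. Assume that the assumptions of Theorem 3.4 are satisfied. Then we 
have, with the notation of Lemma 3.1, the estimates 

\[ \bigl(1 - r(\zeta, g^\dagger)/\mu\bigr)\|g_\alpha^{\delta,\gamma} - g^\dagger\|_\mu^2 \leq \frac{(\gamma + \delta\|g^\dagger\|)^2}{\alpha} + \|\omega\|\bigl(\gamma + \delta D_\mu(\alpha,\delta,\gamma)\bigr) + \frac{\|\omega\|^2\alpha}{4} \]

in the case $\mu > 0$, and 

\[ (1 - E)\|\nabla(g_\alpha^{\delta,\gamma} - g^\dagger)\|^2 \leq \frac{(\gamma + \delta\|g^\dagger\|)^2}{\alpha} + (\|\omega\| + sE)\bigl(\gamma + \delta D_0(\alpha,\delta,\gamma)\bigr) + \frac{(\|\omega\| + sE)^2\alpha}{4} \]

in the case $\mu = 0$. In particular, we have in both cases with a parameter choice 
$\alpha \asymp \max\{\delta, \gamma\}$ a convergence rate 

\[ \|g_\alpha^{\delta,\gamma} - g^\dagger\|_\mu^2 = O(\max\{\delta, \gamma\}). \]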

Remark 3.6. Consider for the moment the setting where the constraint set X is closed 
and convex. Then the convexity of X implies that $r(\zeta, g^\dagger) = 0$ whenever $\zeta \in N_X^P(g^\dagger)$; in 
other words, the proximal normal cone $N_X^P(g^\dagger)$ coincides with the (usual) normal cone 
$N_X(g^\dagger) = \{\zeta : \langle \zeta, g - g^\dagger\rangle \leq 0 \text{ for all } g \in X\}$. Thus in the condition $T^*\omega + \zeta = 2(\mu g^\dagger - \Delta g^\dagger)$ 
for some $\zeta \in N_X^P(g^\dagger)$ no smallness condition is required for $\zeta$, and therefore this condition 
reduces to the classical projected source condition found in [TU1 124"] . 



Remark 3.7. The conditions and results of Theorem 3.4 and Corollary 3.5 can also be 
translated into the context of convex analysis with subgradients and Bregman distances 
(see [91 |20l [26]). Recall that the subdifferential $\partial\mathcal{R}(g^\dagger) \subset X$ of a convex mapping 
$\mathcal{R}\colon X \to [0, +\infty]$ at $g^\dagger$ consists of all elements $\xi \in X$ satisfying $\mathcal{R}(g) \geq \mathcal{R}(g^\dagger) + \langle \xi, g - g^\dagger\rangle$ 
for all $g \in X$. Moreover, the Bregman distance $D_\xi(\cdot\,; g^\dagger)$ is defined as 

\[ D_\xi(g; g^\dagger) := \mathcal{R}(g) - \mathcal{R}(g^\dagger) - \langle \xi, g - g^\dagger\rangle. \]

If $\mathcal{R}(g) := \|g\|_\mu^2$ (setting $\mathcal{R}(g) = +\infty$ if $g \notin H^1(\Omega;\mathbb{R}^k)$), we obtain that the 
subdifferential is non-empty if and only if $\partial_\nu g^\dagger = 0$ on $\partial\Omega$. Moreover, in this case 
its unique element is the function $2(\mu g^\dagger - \Delta g^\dagger)$. Finally, it is easy to see that the 
Bregman distance between g and $g^\dagger$ with respect to $\|\cdot\|_\mu^2$ is precisely $\|g - g^\dagger\|_\mu^2$. 

In this setting, Corollary 3.5 with $\mu > 0$ reads as follows: If there exist $\xi \in \partial\mathcal{R}(g^\dagger)$ 
and $\zeta \in N_X^P(g^\dagger)$ with $r(\zeta, g^\dagger) < \mu$, then 

\[ D_\xi(g_\alpha^{\delta,\gamma}; g^\dagger) = O(\max\{\delta, \gamma\}). \]

Note moreover that in [18] a theory based on abstract convex analysis has been 
developed in order to derive convergence rates for non-convex regularization terms. 
Again, the results of Corollary 3.5 can be seen as special cases of the results in [18, 
Section 4] by realizing that the function $2(\mu g^\dagger - \Delta g^\dagger) - \zeta$ is a generalized subgradient 
of the mapping 

\[ \mathcal{R}(g) := \begin{cases} \|g\|_\mu^2 & \text{if } g \in X, \\ +\infty & \text{else.} \end{cases} \]
4. Extension to the stochastic setting 

In this section, we allow the approximation errors $\|T^\delta - T\|$ and $\|h^\gamma - h\|$ to be stochastic 
and depend on the sample size n. More precisely, $T^\delta$ is a nonparametric estimator of 
the operator T depending on the random sample $(Y_i, X_i, W_i)_{i=1,\dots,n}$ and we will denote 
it by $\hat{T}$. Similarly, $h^\gamma$ is a nonparametric estimator of the function h depending on the 
random sample $(Y_i, X_i, W_i)_{i=1,\dots,n}$ and we will denote it by $\hat{h}$. Finally, the approximated 
regularized solution $g_\alpha^{\delta,\gamma}$ will be denoted by $\hat{g}_\alpha$. 

In the following, we will derive convergence rates in probability for $\hat{g}_\alpha$. To that end, 
recall that a sequence of random variables $Q_n$, $n \in \mathbb{N}$, in a normed space is bounded in 
probability, if for every $\varepsilon > 0$ there exist $C > 0$ and $n_0 \in \mathbb{N}$ such that 

\[ P(\|Q_n\| > C) < \varepsilon \qquad \text{for all } n \geq n_0. \]

Note that an alternative to convergence rates in probability is the derivation of 
convergence rates in expectation, which has been carried out for Tikhonov regularization 
and generalizations in [HE]. In this paper, however, we will restrict ourselves to rates in 
probability in order to be able to exploit the results in [13] on unconstrained instrumental 
regression. 

Following [13], we introduce the kernel approach with generalized kernel functions 
of order l for estimating T and h. Note that the kernel is considered in generalized form 
only to overcome edge effects. Let $\sigma = \sigma_n \to 0$ denote a bandwidth and $K_\sigma(\cdot,\cdot)$ denote 
a univariate generalized kernel function with the properties $K_\sigma(u,t) = 0$ if u > t or 
u < t - 1; for all $t \in [0,1]$, 

\[ \sigma^{-(j+1)} \int_{t-1}^{t} u^j K_\sigma(u,t)\,du = \begin{cases} 1 & \text{if } j = 0, \\ 0 & \text{if } 1 \leq j \leq l-1. \end{cases} \]

We call $K_\sigma(\cdot,\cdot)$ a univariate generalized kernel function of order l (see [23]). A special 
class of multivariate generalized kernel functions of order l is given by that of products 
of univariate generalized kernel functions of order l. Let $K_{X,\sigma}$ and $K_{W,\sigma}$ denote two 
generalized multivariate kernel functions of dimension k + 1 and $K_{Y,\sigma}$ a kernel function 
of dimension 1. First we estimate the density functions $f_{YW}$, $f_{XW}$ and $f_W$. Note that, 
for simplicity of notation, we use the same bandwidth to estimate the three densities: 

\begin{align*}
\hat{f}_{YW}(y,w) &= \frac{1}{n\sigma^{k+2}} \sum_{i=1}^n K_{Y,\sigma}(y - Y_i, y)\,K_{W,\sigma}(w - W_i, w), \\
\hat{f}_{XW}(x,w) &= \frac{1}{n\sigma^{2k+2}} \sum_{i=1}^n K_{X,\sigma}(x - X_i, x)\,K_{W,\sigma}(w - W_i, w), \\
\hat{f}_W(w) &= \frac{1}{n\sigma^{k+1}} \sum_{i=1}^n K_{W,\sigma}(w - W_i, w).
\end{align*}

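Then the estimators of T and h are 

\[ \hat{T}\psi(w) = \int \psi(x)\,\frac{\hat{f}_{XW}(x,w)}{\hat{f}_W(w)}\,dx, \qquad \hat{h}(w) = \int y\,\frac{\hat{f}_{YW}(y,w)}{\hat{f}_W(w)}\,dy. \]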
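The plug-in estimators of T and h can be sketched in a few lines for scalar X and W. The code below is an illustration under simplifying assumptions, not the construction analyzed here: it replaces the generalized boundary kernels by an ordinary Gaussian kernel (so it is only evaluated away from the boundary), uses a hypothetical data-generating process with an endogenous error, and approximates the integral over x by a Riemann sum. With a symmetric kernel integrating to one, the estimator of h reduces to the familiar Nadaraya-Watson form.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density, used here instead of a generalized kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def estimate_h(w_eval, y_obs, w_obs, bandwidth):
    """Kernel estimate of h(w) = E(Y | W = w)."""
    weights = gaussian_kernel((w_eval[:, None] - w_obs[None, :]) / bandwidth)
    return (weights @ y_obs) / weights.sum(axis=1)

def estimate_T(psi, w_eval, x_obs, w_obs, bandwidth, x_grid):
    """Kernel estimate of (T psi)(w) = E(psi(X) | W = w)."""
    dx = x_grid[1] - x_grid[0]
    # Riemann sum for int psi(x) K((x - X_i) / bandwidth) / bandwidth dx, per observation
    kernel_x = gaussian_kernel((x_grid[None, :] - x_obs[:, None]) / bandwidth)
    smoothed_psi = (kernel_x @ psi(x_grid)) * dx / bandwidth
    weights = gaussian_kernel((w_eval[:, None] - w_obs[None, :]) / bandwidth)
    return (weights @ smoothed_psi) / weights.sum(axis=1)

# Hypothetical design: u is unobserved and enters both X and the error e.
rng = np.random.default_rng(0)
n = 5000
w_obs = rng.uniform(size=n)
u = rng.uniform(size=n)
x_obs = 0.5 * w_obs + 0.5 * u
e = (u - 0.5) + 0.1 * rng.normal(size=n)     # E(e | W) = 0, but e depends on X
y_obs = x_obs ** 2 + e                        # structural function g(x) = x^2

w_eval = np.linspace(0.2, 0.8, 7)             # interior points only
x_grid = np.linspace(-0.5, 1.5, 801)
h_hat = estimate_h(w_eval, y_obs, w_obs, bandwidth=0.05)
T_hat_psi = estimate_T(lambda x: x, w_eval, x_obs, w_obs, 0.05, x_grid)
h_true = (0.5 * w_eval + 0.25) ** 2 + 0.25 / 12.0   # E(X^2 | W = w) in this design

print("max error of h_hat:", np.abs(h_hat - h_true).max())
print("max error of T_hat applied to psi(x) = x:", np.abs(T_hat_psi - (0.5 * w_eval + 0.25)).max())
```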

In order to derive a rate of convergence for $\hat{g}_\alpha$, we require 

Assumption 4.1. We assume that the following conditions are satisfied: 

(i) The data $(Y_i, X_i, W_i)$, i = 1, ..., n, define an i.i.d. sample of (Y, X, W). 

(ii) The probability density function $f_{YXW}$ is l times continuously differentiable in the 
interior of $\Omega_Y \times \Omega_X \times \Omega_W$ and bounded away from zero on $\Omega_Y \times \Omega_X \times \Omega_W$. 

(iii) The conditional expectation $E(e^2 \mid W = w)$ is uniformly bounded on $\Omega_W$. 

(iv) Both multivariate kernels $K_{X,\sigma}$ and $K_{W,\sigma}$ are product kernels generated from the 
univariate generalized kernel function $K_\sigma$ with the following properties: 

(a) The kernel function $K_\sigma$ is a generalized kernel function of order l. 

(b) For each $t \in [0,1]$, the function $K_\sigma(\sigma\,\cdot, t)$ is supported on a set of the form 
$[(t-1)/\sigma, t/\sigma] \cap K$ where K is a compact interval not depending on t and 

\[ \sup_{\sigma > 0,\ t \in [0,1],\ u \in K} |K_\sigma(\sigma u, t)| < \infty. \]

(v) The bandwidth parameter satisfies $\sigma \to 0$ and $(n\sigma^{2k+2})^{-1}\log(n) \to 0$. 



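Proposition 4.2. Suppose Assumption 4.1 holds. Let $\rho = \min\{l, k+1\} \geq 2$ and $\mu \geq 0$. 
Let 

\[ g^\dagger \in \operatorname{argmin}\{\|g\|_\mu^2 : Tg = h,\ g \in X\} \]

and 

\[ \hat{g}_\alpha \in \operatorname{argmin}\{\mathcal{T}_\alpha(g; \hat{T}, \hat{h}) : g \in X\}. \]

Assume that $\partial_\nu g^\dagger = 0$ on $\partial\Omega$. Denote moreover by $T^*\colon L^2(\Omega;\mathbb{R}^k) \to L^2(\Omega;\mathbb{R}^k)$ the 
adjoint of T and let $N_X^P(g^\dagger) \subset L^2(\Omega;\mathbb{R}^k)$ be the proximal normal cone to the set X at 
the point $g^\dagger$. 

(i) Let $\mu > 0$. Assume that there exist $\omega \in L^2(\Omega;\mathbb{R}^k)$ and $\zeta \in N_X^P(g^\dagger)$ with $r(\zeta, g^\dagger) < \mu$ 
such that 

\[ 2(\mu g^\dagger - \Delta g^\dagger) = T^*\omega + \zeta. \]

Then the estimate 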

\\g a -9% = P ((1 + ^) ^ a +a2P + + °> + a) 

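holds. 

(ii) Let $\mu = 0$. Assume that there exist $\omega \in L^2(\Omega;\mathbb{R}^k)$ and $\zeta \in N_X^P(g^\dagger)$ such that 

\[ -2\Delta g^\dagger = T^*\omega + \zeta \]

and 

\[ A^2\bigl(\|T\| D(T)^{-1} + D(T)^{-1} + 1\bigr)^2\, r(\zeta, g^\dagger) < 1, \]

where A and D(T) are as in Lemma 2.2. Then the estimate 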



Vg a -Vg*f = P ( 




+ a p + a 



) 



holds. 



In particular, if 



a X n 2(fc+p+l) 



and 



(j X ^ 2(fc + p+l) ? 



then we obtain in both cases the rate 



9a~ 9 




P {n 2(fc+ P +i)). 



Proof. Note first that the assumption that the density $f_{YXW}$ is bounded away from zero 
implies that the operator T is bounded and satisfies $Tc \neq 0$ for every non-zero constant function 
c. Moreover, in [13] a convergence rate result for the estimation errors $\|\hat{T} - T\|$ and $\|\hat{h} - h\|$ 
has been derived under Assumption 4.1. Together with the results of Lemma 3.1, 
Theorem 3.4 and Corollary 3.5 this immediately proves the assertion in the case $\mu > 0$. 

In the case $\mu = 0$, note that the assumption on the behaviour of $\alpha$ and 
Proposition 2.4 imply that the regularized solutions $\hat{g}_\alpha$ converge in probability to $g^\dagger$. 
Moreover, the convergence in probability of $\hat{T}$ to T implies that $1/(D(T) - \|\hat{T} - T\|) = 
O_P(1)$, and therefore, as $\sigma \to 0$ and $1/(n\sigma^{2k+2}) \to 0$, we obtain in the notation of 
Lemma 3.1 the estimate 

\[ D_0(\alpha, \|\hat{T} - T\|, \|\hat{h} - h\|) = O_P(1). \]

Then the result follows again immediately from Lemma 3.1, Theorem 3.4 and 
Corollary 3.5. □ 
5. Conclusion 

In this paper, we have studied the problem of nonparametric regression in the presence of 
endogenous variables and additional non-convex shape constraints. The main motivation 
is the estimation of the consumer demand function, which, according to standard 
microeconomic theory, satisfies certain (non-linear) integrability conditions. We have 
used instruments in order to tackle the issue of endogeneity, which, in the case where 
the coupling between the instruments and the explanatory variables is weak (that is, 
only given by a density), leads to the solution of an ill-posed operator equation. 

We propose to solve the resulting inverse problem by (constrained) Tikhonov 
regularization using a weighted Sobolev norm as a regularization term. Because of the 
weak closedness of the constrained set in the Sobolev space, the regularization method 
is convergent. In addition, we have derived convergence rates under the additional 
assumption that the true solution satisfies a certain variational inequality, which is 
shown to hold if g^ satisfies a projected source condition. In contrast to the usual convex 
case, however, this condition is coupled with a smallness condition. The convergence 
rates are derived in both a deterministic and a stochastic setting. In the latter situation 
we have the additional problem that the correspondence between the instruments and 
the explanatory variables, and thus the operator itself, is not known exactly but has to 
be estimated in a first step. Here we propose to use a kernel estimator, which allows 
us to obtain rates in probability for the operator error in dependence of the number of 
measurements. 